1 GPGPU: general purpose computation on graphics hardware
David Luebke, Mark Harris, Jens Kruger, Tim Purcell, Naga Govindaraju, Ian Buck, Cliff Woolley, Aaron Lefohn

August 2004 ACM SIGGRAPH 2004 Course Notes SIGGRAPH '04 

Publisher: ACM Press 

Full text available: pdf (63.03 MB) Additional Information: full citation, abstract, citings

The graphics processor (GPU) on today's commodity video cards has evolved into an 
extremely powerful and flexible processor. The latest graphics architectures provide 
tremendous memory bandwidth and computational horsepower, with fully programmable 
vertex and pixel processing units that support vector operations up to full IEEE floating 
point precision. High level languages have emerged for graphics hardware, making this 
computational power accessible. Architecturally, GPUs are highly parallel s ... 



2 Trace-driven memory simulation: a survey
Richard A. Uhlig, Trevor N. Mudge 

June 1997 ACM Computing Surveys (CSUR), volume 29 issue 2 
Publisher: ACM Press 

Additional Information: full citation, abstract, references, citings, index terms, review

As the gap between processor and memory speeds continues to widen, methods for 
evaluating memory system designs before they are implemented in hardware are 
becoming increasingly important. One such method, trace-driven memory simulation, has 
been the subject of intense interest among researchers and has, as a result, enjoyed 
rapid development and substantial improvements during the past decade. This article 
surveys and analyzes these developments by establishing criteria for evaluating trac ... 

Keywords: TLBs, caches, memory management, memory simulation, trace-driven 
simulation 



3 Intelligent database caching through the use of page-answers and page-traces

Nabil Kamel, Roger King 

December 1992 ACM Transactions on Database Systems (TODS), volume 17 issue 4 
Publisher: ACM Press

Full text available: pdf (3.08 MB) Additional Information: full citation, abstract, references, citings, index terms

In this paper a new method to improve the utilization of main memory systems is 
presented. The new method is based on prestoring in main memory a number of query 
answers, each evaluated out of a single memory page. To this end, the ideas of page- 
answers and page-traces are formally described and their properties analyzed. The query 
model used here allows for selection, projection, join, recursive queries as well as 
arbitrary combinations. We also show how to apply the approach under update ... 

Keywords: artificial intelligence, databases, page access 



4 Application-specific optimizations: Two-level mapping based cache index selection for packet forwarding engines
Kaushik Rajan, R. Govindarajan

September 2006 Proceedings of the 15th international conference on Parallel 
architectures and compilation techniques PACT '06 

Publisher: ACM Press 

Full text available: pdf (802.85 KB) Additional Information: full citation, abstract, references, index terms

Packet forwarding is a memory-intensive application requiring multiple accesses through a 
trie structure. The efficiency of a cache for this application critically depends on the 
placement function to reduce conflict misses. Traditional placement functions use a one- 
level mapping that naively partitions trie-nodes into cache sets. However, as a significant 
percentage of trie nodes are not useful, these schemes suffer from a non-uniform 
distribution of useful nodes to sets. This in turn results i ... 

Keywords: cache architectures, network processors 



5 Exploiting perception in high-fidelity virtual environments

Additional presentations from the 24th course are available on the citation page

Mashhuda Glencross, Alan G. Chalmers, Ming C. Lin, Miguel A. Otaduy, Diego Gutierrez 
July 2006 ACM SIGGRAPH 2006 Courses SIGGRAPH '06 
Publisher: ACM Press 

Full text available: pdf, mov (68:6 MIN) Additional Information: full citation, abstract, references

The objective of this course is to provide an introduction to the issues that must be 
considered when building high-fidelity 3D engaging shared virtual environments. The 
principles of human perception guide important development of algorithms and 
techniques in collaboration, graphical, auditory, and haptic rendering. We aim to show 
how human perception is exploited to achieve realism in high fidelity environments within 
the constraints of available finite computational resources. In this course w ... 

Keywords: collaborative environments, haptics, high-fidelity rendering, human-computer 
interaction, multi-user, networked applications, perception, virtual reality 



6 Parallel execution of prolog programs: a survey
Gopal Gupta, Enrico Pontelli, Khayri A.M. Ali, Mats Carlsson, Manuel V. Hermenegildo
July 2001 ACM Transactions on Programming Languages and Systems (TOPLAS), volume 23 issue 4
Publisher: ACM Press

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

Since the early days of logic programming, researchers in the field realized the potential 
for exploitation of parallelism present in the execution of logic programs. Their high-level 
nature, the presence of nondeterminism, and their referential transparency, among other 
characteristics, make logic programs interesting candidates for obtaining speedups 
through parallel execution. At the same time, the fact that the typical applications of logic 
programming frequently involve irregular computatio ... 

Keywords: Automatic parallelization, constraint programming, logic programming, 
parallelism, prolog 



7 Cache coherence in large-scale shared-memory multiprocessors: issues and comparisons
David J. Lilja

September 1993 ACM Computing Surveys (CSUR), Volume 25 issue 3 
Publisher: ACM Press 

Full text available: pdf (3.12 MB) Additional Information: full citation, references, citings, index terms



8 A proposal for a new hardware cache monitoring architecture
Martin Schulz, Jie Tao, Jurgen Jeitner, Wolfgang Karl
June 2002 ACM SIGPLAN Notices, Proceedings of the 2002 workshop on Memory system performance MSP '02, volume 38 issue 2 supplement
Publisher: ACM Press 

Full text available: pdf (1.23 MB) Additional Information: full citation, abstract, references, citings

The analysis of the memory access behavior of applications, an essential step for a 
successful cache optimization, is a complex task. It needs to be supported with 
appropriate tools and monitoring facilities. Currently, however, users can only rely on 
either simulation based approaches, which deliver a large degree of detail but are 
restricted in their applicability, or on hardware counters embedded into processors, which 
allow to keep track of very few, mostly global events and hence only provi ... 

9 LIRS: an efficient low inter-reference recency set replacement policy to improve buffer cache performance
Song Jiang, Xiaodong Zhang
June 2002 ACM SIGMETRICS Performance Evaluation Review, Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems SIGMETRICS '02, volume 30 issue 1
Publisher: ACM Press 

Full text available: pdf (290.24 KB) Additional Information: full citation, abstract, references, citings

Although LRU replacement policy has been commonly used in the buffer cache 
management, it is well known for its inability to cope with access patterns with weak 
locality. Previous work, such as LRU-K and 2Q, attempts to enhance LRU capacity by 
making use of additional history information of previous block references other than only 
the recency information used in LRU. These algorithms greatly increase complexity and/or 
can not consistently provide performance improvement. Many recently proposed ... 

10 Dynamic hot data stream prefetching for general-purpose programs
Trishul M. Chilimbi, Martin Hirzel
May 2002 ACM SIGPLAN Notices, Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation PLDI '02, volume 37 issue 5
Publisher: ACM Press

Full text available: pdf (210.85 KB) Additional Information: full citation, abstract, references, citings, index terms

Prefetching data ahead of use has the potential to tolerate the growing processor-memory 
performance gap by overlapping long latency memory accesses with useful computation. 
While sophisticated prefetching techniques have been automated for limited domains, 
such as scientific codes that access dense arrays in loop nests, a similar level of success 
has eluded general-purpose programs, especially pointer-chasing codes written in 
languages such as C and C++. We address this problem by describing ... 

Keywords: data reference profiling, dynamic optimization, dynamic profiling, memory 
performance optimization, prefetching, temporal profiling 



11 A predicate-based caching scheme for client-server database architectures 
Arthur M. Keller, Julie Basu 

January 1996 The VLDB Journal — The International Journal on Very Large Data 

Bases, volume 5 issue 1 
Publisher: Springer-Verlag New York, Inc. 

Full text available: pdf (162.80 KB) Additional Information: full citation, abstract, citings, index terms

We propose a new client-side data-caching scheme for relational databases with a central 
server and multiple clients. Data are loaded into each client cache based on queries 
executed on the central database at the server. These queries are used to form predicates 
that describe the cache contents. A subsequent query at the client may be satisfied in its 
local cache if we can determine that the query result is entirely contained in the cache. 
This issue is called cache completeness. A separ ... 

Keywords: Cache completeness, Cache currency, Caching, Multiple clients, Relational 
databases 



12 A parallel, incremental, mostly concurrent garbage collector for servers 

Katherine Barabash, Ori Ben-Yitzhak, Irit Goft, Elliot K. Kolodner, Victor Leikehman, Yoav 
Ossia, Avi Owshanko, Erez Petrank 

November 2005 ACM Transactions on Programming Languages and Systems (TOPLAS), volume 27 issue 6
Publisher: ACM Press

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

Multithreaded applications with multigigabyte heaps running on modern servers provide 
new challenges for garbage collection (GC). The challenges for "server-oriented" GC 
include: ensuring short pause times on a multigigabyte heap while minimizing throughput 
penalty, good scaling on multiprocessor hardware, and keeping the number of expensive 
multicycle fence instructions required by weak ordering to a minimum. We designed and 
implemented a collector facing these demands building on th ... 

Keywords: Garbage collection, JVM, concurrent garbage collection 



13 Collision detection and proximity queries 
Sunil Hadap, Dave Eberle, Pascal Volino, Ming C. Lin, Stephane Redon, Christer Ericson 
August 2004 ACM SIGGRAPH 2004 Course Notes SIGGRAPH '04 
Publisher: ACM Press 

Full text available: pdf (11.22 MB) Additional Information: full citation, abstract

This course will primarily cover widely accepted and proved methodologies in collision 
detection. In addition more advanced or recent topics such as continuous collision 
detection, ADFs, and using graphics hardware will be introduced. When appropriate the 
methods discussed will be tied to familiar applications such as rigid body and cloth 
simulation, and will be compared. The course is a good overview for those developing 
applications in physically based modeling, VR, haptics, and robotics. 

14 Special issue: AI in engineering
D. Sriram, R. Joobbani
April 1985 ACM SIGART Bulletin, issue 92
Publisher: ACM Press

Full text available: pdf (8.79 MB) Additional Information: full citation, abstract

The papers in this special issue were compiled from responses to the announcement in 
the July 1984 issue of the SIGART newsletter and notices posted over the ARPAnet. The 
interest being shown in this area is reflected in the sixty papers received from over six 
countries. About half the papers were received over the computer network. 

15 Introduction to real-time ray tracing: The RTRT core
Ingo Wald
July 2005 ACM SIGGRAPH 2005 Courses SIGGRAPH '05
Publisher: ACM Press

Full text available: pdf (1.13 MB) Additional Information: full citation, abstract, references

The overall design decisions of the RTRT/OpenRT framework are described in detail in 
[Wald04]. To summarize the most important points, we have chosen to only support 
triangles, to exploit SIMD extensions in a data-parallel way, to optimize for memory and 
caches, and to use BSP trees as an acceleration structure. In this chapter, we are now 
going to discuss the actual algorithms and implementation of these topics in more detail. 

16 Data cache management using frequency-based replacement
John T. Robinson, Murthy V. Devarakonda
April 1990 ACM SIGMETRICS Performance Evaluation Review, Proceedings of the 1990 ACM SIGMETRICS conference on Measurement and modeling of computer systems SIGMETRICS '90, volume 18 issue 1
Publisher: ACM Press

Full text available: pdf (991.05 KB) Additional Information: full citation, abstract, references, citings, index terms

We propose a new frequency-based replacement algorithm for managing caches used for 
disk blocks by a file system, database management system, or disk control unit, which we 
refer to here as data caches. Previously, LRU replacement has usually been used for such 
caches. We describe a replacement algorithm based on the concept of maintaining 
reference counts in which locality has been "factored out". In this algorithm replacement 
choices are made using a combination of reference f ... 

17 Executing compressed programs on an embedded RISC architecture
Andrew Wolfe, Alex Chanin
December 1992 ACM SIGMICRO Newsletter, Proceedings of the 25th annual international symposium on Microarchitecture MICRO 25, volume 23 issue 1-2
Publisher: IEEE Computer Society Press, ACM Press

Full text available: pdf Additional Information: full citation, references, citings, index terms



Yannis Kotidis, Nick Roussopoulos 

December 2001 ACM Transactions on Database Systems (TODS), volume 26 issue 4 
Publisher: ACM Press 

Full text available: pdf (892.57 KB) Additional Information: full citation, abstract, references, citings, index terms, review

Materialized aggregate views represent a set of redundant entities in a data warehouse 
that are frequently used to accelerate On-Line Analytical Processing (OLAP). Due to the 
complex structure of the data warehouse and the different profiles of the users who 
submit queries, there is need for tools that will automate and ease the view selection and 
management processes. In this article we present DynaMat, a system that manages 
dynamic collections of materialized aggregate views in a data warehous ... 

Keywords: Data cube, OLAP, data warehousing, materialized views 



19 The automatic improvement of locality in storage systems
Windsor W. Hsu, Alan Jay Smith, Honesty C. Young
November 2005 ACM Transactions on Computer Systems (TOCS), volume 23 issue 4
Publisher: ACM Press

Full text available: pdf (2.58 MB) Additional Information: full citation, abstract, references, index terms

Disk I/O is increasingly the performance bottleneck in computer systems despite rapidly 
increasing disk data transfer rates. In this article, we propose Automatic Locality- 
Improving Storage (ALIS), an introspective storage system that automatically reorganizes 
selected disk blocks based on the dynamic reference stream to increase effective storage 
performance. ALIS is based on the observations that sequential data fetch is far more 
efficient than random access, that improving seek distances prod ... 

Keywords: Data layout optimization, block layout, data reorganization, data 
restructuring, defragmentation, disk technology trends, locality improvement, prefetching 



20 I/O reference behavior of production database workloads and the TPC benchmarks—an analysis at the logical level
Windsor W. Hsu, Alan Jay Smith, Honesty C. Young
March 2001 ACM Transactions on Database Systems (TODS), volume 26 issue 1
Publisher: ACM Press

Full text available: pdf (5.42 MB) Additional Information: full citation, abstract, references, index terms

As improvements in processor performance continue to far outpace improvements in 
storage performance, I/O is increasingly the bottleneck in computer systems, especially in 
large database systems that manage huge amounts of data. The key to achieving good 
I/O performance is to thoroughly understand its characteristics. In this article we present 
a comprehensive analysis of the logical I/O reference behavior of the peak 
production database workloads from ten of the world's largest corporatio ... 

Keywords: I/O, TPC benchmarks, caching, locality, prefetching, production database 
workloads, reference behavior, sequentiality, workload characterization 